Skip to content

PendingMgsUpdates does not impl Serialize/Deserialize correctly#7992

Merged
davepacheco merged 3 commits intomainfrom
dap/serialize
Apr 17, 2025
Merged

PendingMgsUpdates does not impl Serialize/Deserialize correctly#7992
davepacheco merged 3 commits intomainfrom
dap/serialize

Conversation

@davepacheco
Copy link
Copy Markdown
Collaborator

This PR fixes a few problems that I ran into when I went to go test #7978 using reconfigurator-exec-unsafe.

My basic plan was:

  1. grab a test racklette (dublin)
  2. use omdb reconfigurator export to save the reconfigurator state to a file
  3. use reconfigurator-cli to export the system's target blueprint from that file
  4. hand-edit the blueprint to include a PendingMgsUpdates that contains an example update to one of the sled SPs
  5. use reconfigurator-exec-unsafe to execute the blueprint

The main problem I ran into was constructing a serialized PendingMgsUpdates containing the update that I wanted to do. Since we don't have reconfigurator-cli support for this property yet, I wrote some code to do this. (I'll put that into a comment below for posterity.) But when I went to serialize it I got:

error: serializing blueprint: key must be a string

The problem is that PendingMgsUpdates didn't impl Serialize correctly. The derived impl tries to serialize a map whose keys (Arc<BaseboardId>) are objects. This is not allowed in JSON.

This PR impls Serialize/Deserialize by hand as a sequence of PendingMgsUpdate objects. This is closer to the spirit of IdMap anyway.

This also fixes a few small things in reconfigurator-exec-unsafe that made it a little harder to test.

With this PR, I'm able to complete the above test and verify that using reconfigurator-exec-unsafe, we complete an SP update through blueprint execution 🎉 (This still won't work in Nexus until we do the database implementation of this field.)

@davepacheco
Copy link
Copy Markdown
Collaborator Author

I should add: I believe there's zero risk with this change because there is no code in Omicron today that constructs a blueprint with a non-empty PendingMgsUpdates. Also, Serialize/Deserialize are only used for the internal API (for omdb) and for omdb reconfigurator export.

Besides what's in this PR, I also needed these two patches for my testing. The first patch makes reconfigurator-exec-unsafe ignore the fact that dublin's database schema is older than my copy of the tool expects:

diff --git a/dev-tools/reconfigurator-exec-unsafe/src/main.rs b/dev-tools/reconfigurator-exec-unsafe/src/main.rs
index cba59993e..c8abfec5e 100644
--- a/dev-tools/reconfigurator-exec-unsafe/src/main.rs
+++ b/dev-tools/reconfigurator-exec-unsafe/src/main.rs
@@ -109,12 +109,7 @@ impl ReconfiguratorExec {
 
         info!(&log, "setting up database pool");
         let pool = Arc::new(db::Pool::new(&log, &qorb_resolver));
-        let datastore = Arc::new(
-            DataStore::new_failfast(&log, pool)
-                .await
-                .context("creating datastore")?,
-        );
-
+        let datastore = Arc::new(DataStore::new_unchecked(log.clone(), pool));
         let result = self.do_exec(log, &datastore, &qorb_resolver).await;
         datastore.terminate().await;
         result

This is dangerous in general but harmless in my case because this isn't doing anything with the changed part of the schema.

And this patch is what I used to have reconfigurator-cli's load-example generate a system with a PendingMgsUpdates value that I wanted:

diff --git a/nexus/reconfigurator/planning/src/blueprint_builder/builder.rs b/nexus/reconfigurator/planning/src/blueprint_builder/builder.rs
index 4780a2aef..7ce561cdf 100644
--- a/nexus/reconfigurator/planning/src/blueprint_builder/builder.rs
+++ b/nexus/reconfigurator/planning/src/blueprint_builder/builder.rs
@@ -83,7 +83,16 @@ use thiserror::Error;
 
 use super::ClickhouseZonesThatShouldBeRunning;
 use super::clickhouse::ClickhouseAllocator;
+use gateway_client::types::SpType;
+use nexus_types::deployment::ExpectedVersion;
+use nexus_types::deployment::PendingMgsUpdate;
+use nexus_types::deployment::PendingMgsUpdateDetails;
 use nexus_types::deployment::PendingMgsUpdates;
+use nexus_types::inventory::BaseboardId;
+use std::sync::Arc;
+use tufaceous_artifact::ArtifactHashId;
+use tufaceous_artifact::ArtifactKind;
+use tufaceous_artifact::KnownArtifactKind;
 
 /// Errors encountered while assembling blueprints
 #[derive(Debug, Error)]
@@ -452,10 +461,33 @@ impl<'a> BlueprintBuilder<'a> {
             .collect::<BTreeMap<_, _>>();
         let num_sleds = sleds.len();
 
+        let mut pending_mgs_updates = PendingMgsUpdates::new();
+        let baseboard_id = Arc::new(BaseboardId {
+            part_number: String::from("913-0000019"),
+            serial_number: String::from("BRM27230037"),
+        });
+        let artifact_hash_id = ArtifactHashId {
+            kind: ArtifactKind::from_known(KnownArtifactKind::GimletSp),
+            hash: "47266ede81e13f5f1e36623ea8dd963842606b783397e4809a9a5f0bda0f8170".parse().unwrap(),
+        };
+        let update = PendingMgsUpdate {
+            baseboard_id,
+            sp_type: SpType::Sled,
+            slot_id: 15,
+            details: PendingMgsUpdateDetails::Sp {
+                expected_active_version: "1.0.36".parse().unwrap(),
+                expected_inactive_version: ExpectedVersion::Version(
+                    "1.0.36".parse().unwrap(),
+                ),
+            },
+            artifact_hash_id,
+            artifact_version: "1.0.34".parse().unwrap(),
+        };
+        pending_mgs_updates.insert(update);
         Blueprint {
             id: rng.next_blueprint(),
             sleds,
-            pending_mgs_updates: PendingMgsUpdates::new(),
+            pending_mgs_updates,
             parent_blueprint_id: None,
             internal_dns_version: Generation::new(),
             external_dns_version: Generation::new(),

and it serializes like this:

{
  "pending_mgs_updates": [
    {
      "baseboard_id": {
        "part_number": "913-0000019",
        "serial_number": "BRM27230037"
      },
      "sp_type": "sled",
      "slot_id": 15,
      "details": {
        "component": "sp",
        "expected_active_version": "1.0.36",
        "expected_inactive_version": {
          "kind": "version",
          "version": "1.0.36"
        }
      },
      "artifact_hash_id": {
        "kind": "gimlet_sp",
        "hash": "47266ede81e13f5f1e36623ea8dd963842606b783397e4809a9a5f0bda0f8170"
      },
      "artifact_version": "1.0.34"
    }
  ],
}

creator: OmicronZoneUuid::from_untyped_uuid(creator),
sender,
mgs_updates,
mgs_updates: mgs_updates.clone(),
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed because the driver shuts down when the sender side of its watch channel closes. We don't want that to happen until later, now at L299 (in the new code).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a comment here and in drop below?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

creator: OmicronZoneUuid::from_untyped_uuid(creator),
sender,
mgs_updates,
mgs_updates: mgs_updates.clone(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a comment here and in drop below?

{
let mut map = PendingMgsUpdates::new();
while let Some(u) = seq.next_element()? {
map.insert(u);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it legal for there to be duplicates?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. In practice:

  • on the serialize side, this is impossible because IdMap uses a BTreeMap under the hood
  • in JSON, this is possible but the behavior is unspecified
  • on the deserialize side, we will ignore all entries having the same baseboard id except for the last one

I think this is about the same behavior as all the other maps that we have.

Copy link
Copy Markdown
Contributor

@jmpesp jmpesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, but we should add some unit tests 🚀

loop {
let status = status_rx.borrow();
debug!(&log, "MGS update status"; "status" => ?status);
info!(&log, "MGS update status"; "status" => ?status);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these be changed back to debug?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question but I don't think so. The intent was to print these every few seconds after execution finishes until the updates are complete. Otherwise you don't really know what's going on with the updates.

@davepacheco
Copy link
Copy Markdown
Collaborator Author

Looks fine, but we should add some unit tests 🚀

Agreed on having tests. These will be a lot easier once we do the reconfigurator-cli support.

@davepacheco
Copy link
Copy Markdown
Collaborator Author

Looks fine, but we should add some unit tests 🚀

Agreed on having tests. These will be a lot easier once we do the reconfigurator-cli support.

Well, I realized how easy it would be to add a smoke test, so I just did it 😄

@davepacheco davepacheco enabled auto-merge (squash) April 16, 2025 23:53
@davepacheco davepacheco merged commit 852dfc8 into main Apr 17, 2025
16 checks passed
@davepacheco davepacheco deleted the dap/serialize branch April 17, 2025 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants